Abstract In a previous work [physics/061108v2], it was shown that the volume spanned by a
molecular system in its conformational space can be effectively bounded by a polyhedral cone,
and that this cone is described by a simple combinatorial formula. In addition, a transversal
graph structure was constructed that encodes the region of conformational space accessible
to the system. From the information in this graph, it is possible to decompose the main cone into
a hierarchy of smaller, more manageable cones that bound progressively more tightly
the region in which the system evolves.
I. Partition sequences



A partition of molecular conformational space (hereafter referred to as CS) was defined and studied
in [1-4] based on the notion of dominance partition sequence. A simple example will show
the basic ideas behind this concept. Consider a simple four-atom molecular system in which an
arbitrary order relation (numbering) has been defined on the set of atoms, and let their $(x, y, z)$
coordinates be
From (1) we see that the following relations hold:

$$x_2 < x_3 < x_4 < x_1, \qquad y_4 < y_3 < y_1 < y_2, \qquad z_4 < z_2 < z_1 < z_3 \tag{2}$$

From relations (2) we say that, for example, coordinate $x_3$ dominates $x_2$ and is dominated by
$x_4$ and $x_1$. Thus the molecular conformation defined by coordinate set (1) can be characterized
by the following dominance partition sequence (DPS)

$$\{\{(2)(3)(4)(1)\}_x,\ \{(4)(3)(1)(2)\}_y,\ \{(4)(2)(1)(3)\}_z\} \tag{3}$$

where the atom numbers in x, y and z are ordered as in (2). Using sequences like (3), the CS
of a molecular system can be partitioned into a set of discrete cells such that all the conformations
in a cell have the same DPS. As discussed in [1,4], the cells in CS have the shape of polyhedral
cones with the vertex at the origin.
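
The passage from a set of coordinates to a simple DPS can be sketched in a few lines of Python. This is only an illustration: the coordinate values below are hypothetical, chosen so that they satisfy relations (2), and the function name `simple_dps` is an assumption rather than code from the cited works.

```python
# Sketch: derive the simple DPS of a conformation by sorting the atom
# numbers along each coordinate axis. The coordinate values are
# hypothetical; they are chosen only to satisfy relations (2).

def simple_dps(coords):
    """Return {axis: atom numbers in increasing order of that coordinate}."""
    dps = {}
    for k, axis in enumerate("xyz"):
        # Ties between coordinates are not handled in this sketch.
        dps[axis] = tuple(sorted(coords, key=lambda atom: coords[atom][k]))
    return dps

# Hypothetical coordinates for atoms 1..4, consistent with (2).
coords = {
    1: (4.0, 3.0, 3.0),
    2: (1.0, 4.0, 2.0),
    3: (2.0, 2.0, 4.0),
    4: (3.0, 1.0, 1.0),
}

dps = simple_dps(coords)
print(dps)
```

With these values the three tuples reproduce the orderings written in (3).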

Formula (3) for DPSs has the peculiar characteristic that each index is enclosed in
parentheses. This is so because in this way the notation can be extended to designate sets of
contiguous cells. Suppose we have a conformation with DPS $\{\ldots(i)(j)\ldots\}_c$. Exchanging the
coordinates $c_i$ and $c_j$ results in a new conformation with DPS $\{\ldots(j)(i)\ldots\}_c$ that lies in an
adjacent cell. Thus the notation for DPS in (3) can be extended to

$$\{\ldots(i\ j)\ldots\}_c \tag{4}$$

where (4) designates all the sequences that can be obtained by a permutation of the consecutive
numbers i and j.

This can be generalized [4] for any set of consecutive numbers

$$\{\ldots(i_1\ i_2\ \cdots\ i_n)\ldots\}_c \tag{5}$$

where (5) designates any sequence obtained by permuting the consecutive numbers $i_1, i_2, \ldots, i_n$.
In what follows sequences like (3) will be designated as simple DPSs, while (4) and (5) will be
extended DPSs.
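
As a minimal sketch of what an extended DPS stands for, the following Python function lists the simple sequences it encodes. The representation (a tuple of blocks, each block a tuple of atom numbers) and the name `expand` are assumptions made for this illustration.

```python
from itertools import permutations, product

def expand(extended):
    """Return the set of simple sequences encoded by an extended DPS.

    `extended` is a tuple of blocks, each block a tuple of atom numbers;
    for instance ((8,), (4, 5), (3,)) stands for {(8)(4 5)(3)}.
    """
    # Every block may be permuted freely; blocks keep their relative order.
    per_block = [list(permutations(block)) for block in extended]
    return {sum(choice, ()) for choice in product(*per_block)}

cells = expand(((8,), (4, 5), (3,)))
print(sorted(cells))  # [(8, 4, 5, 3), (8, 5, 4, 3)]
```

The number of encoded cells is the product of the factorials of the block sizes, which is why a short expression can designate a very large set of cells.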

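An inclusion relation among DPSs can be defined: if $\mathcal{P}_a$ and $\mathcal{P}_b$ are two DPSs and if
$S_{\mathcal{P}_a}$ and $S_{\mathcal{P}_b}$ are the sets of simple sequences that they encode, then

Definition 1. $\mathcal{P}_a \subset \mathcal{P}_b$ if $S_{\mathcal{P}_a} \subset S_{\mathcal{P}_b}$.

With this definition extended DPSs amount to more than just sequences: let $\mathcal{P}$ be any DPS; the
set $S_{\mathcal{P}}$ defined by $\{S_{\mathcal{P}} : \forall \mathcal{P}_s \subset \mathcal{P} \Rightarrow \mathcal{P}_s \in S_{\mathcal{P}}\}$ is the partially ordered set (poset)
associated to $\mathcal{P}$ [1,6,7].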
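
Definition 1 can be checked mechanically by comparing the sets of simple sequences. The sketch below assumes that a DPS is stored as a tuple of blocks and uses non-strict inclusion; the helper names are illustrative only.

```python
from itertools import permutations, product

def simple_sequences(dps):
    """Set of simple sequences encoded by a DPS given as a tuple of blocks."""
    per_block = [list(permutations(block)) for block in dps]
    return {sum(choice, ()) for choice in product(*per_block)}

def contained_in(p_a, p_b):
    """Definition 1, read non-strictly: all sequences of p_a belong to p_b."""
    return simple_sequences(p_a) <= simple_sequences(p_b)

# {(5 8)(4)(3)} is contained in {(5 8)(3 4)} ...
inside = contained_in(((5, 8), (4,), (3,)), ((5, 8), (3, 4)))
# ... whereas {(8)(4)(3 5)} is not contained in {(8)(4 5)(3)}.
outside = contained_in(((8,), (4,), (3, 5)), ((8,), (4, 5), (3,)))
print(inside, outside)  # True False
```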



II. Generalized partition sequences 

DPSs are very useful structures: with a minimum of code they make it possible to designate huge
numbers of cells in CS. However, it was shown in [5] that if dominance partition sequences are to
quantify the regions enclosing molecular dynamics trajectories, they require one further level of
codification: the parentheses in (3) and (5) must be allowed to overlap. For example

$$\{(_1\ 3\ 4\ (_2\ 8\ 9\ )_1\ 1\ 7\ )_2\} \tag{6}$$

simultaneously encodes the extended DPSs {(3 4 8 9)(1 7)} and {(3 4)(1 7 8 9)}. As can be
seen from (6), pairs of parentheses must bear an index so that we can tell where each one begins
and ends. In order to distinguish (6) from ordinary DPSs, the denomination generalized dominance
partition sequences (GDPS) was proposed in [5].
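
One way to read (6) is as a base ordering of the atom numbers together with two overlapping index ranges, one per pair of parentheses. Under that reading, which is an assumption of this sketch and covers only the case of exactly two overlapping pairs, the two extended DPSs quoted above are obtained by giving the overlap to one block or to the other.

```python
def resolve_overlap(base, first, second):
    """Split a GDPS with two overlapping parentheses into two extended DPSs.

    base   -- atom numbers in the order in which they are written
    first  -- (start, stop) slice covered by the first pair of parentheses
    second -- (start, stop) slice covered by the second, overlapping pair
    """
    (a0, a1), (b0, b1) = first, second
    if not a0 < b0 < a1 < b1:
        raise ValueError("the two parentheses must overlap")
    # Reading 1: the overlapping numbers belong to the first block.
    left = (tuple(base[a0:a1]), tuple(base[a1:b1]))
    # Reading 2: the overlapping numbers belong to the second block.
    right = (tuple(base[a0:b0]), tuple(base[b0:b1]))
    return left, right

left, right = resolve_overlap([3, 4, 8, 9, 1, 7], (0, 4), (2, 6))
print(left)   # ((3, 4, 8, 9), (1, 7))
print(right)  # ((3, 4), (8, 9, 1, 7))
```

Since the order inside a block is free, (8 9 1 7) and (1 7 8 9) denote the same block.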

Coding sets of cells from conformational space as in (6) makes it possible to define a cone in CS
that wraps the region where the system evolves. For example, from a 2 ns molecular dynamics
trajectory of a 58-residue protein structure [9], the pancreatic trypsin inhibitor (PTI) [8], and using
the dominance relations matrix from figure 1 in [5], we have calculated the minimal GDPS-cone
enclosing the α-carbon coordinates of the PTI
As stressed at the end of section I, the importance of expressions like (7) lies in their associated
poset. Because the poset is a hierarchical structure, it allows a hierarchical decomposition of the
CS region into sets of smaller cones and, more importantly, it facilitates the sorting of the graph of
cells [3-5], which is the structure containing all the information about cells in CS.

III. Merging cells from adjacent nodes in the compact graph of cells 

The graph of cells, or G, is the fundamental structure of the present approach. It arose from
the notion that the relative motions of small sets of atoms in the molecule can be thoroughly
sampled in computer simulations. G encodes the set of cells in conformational space that can be
accessed through the allowed combinations of these movements.



This structure is constructed by first dividing the molecule into four-atom ordered sets whose
3D-structure is that of a simplex¹. Next, for each simplex, the visited cells in its CS are determined
empirically from computer simulations. Then G is built as follows [3-4]:

• Each cell from a simplex is a node of G 

• The edges of the graph are between compatible nodes in adjacent simplexes: two cells from 
the CSs of two simplexes that share a face, or equivalently three atoms, are said to be 
compatible if their DPSs restricted to the numbers of the shared atoms are equal [5]. 

• For every pair of simplexes sharing a face, each cell from the CS of one simplex has an 
edge towards at least one cell from the other simplex, otherwise the structure would be 
geometrically inconsistent. 
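
The compatibility test used to place the edges of G can be sketched as follows. The cells are hypothetical examples for two simplexes {1, 2, 3, 4} and {2, 3, 4, 5} sharing the face {2, 3, 4}; the dictionary layout and the function names are assumptions of this illustration.

```python
def restrict(sequence, shared):
    """Keep only the shared atom numbers of a simple DPS, preserving order."""
    return tuple(n for n in sequence if n in shared)

def compatible(cell_a, cell_b, shared):
    """Two cells are compatible if their restricted DPSs agree on x, y and z."""
    return all(
        restrict(cell_a[axis], shared) == restrict(cell_b[axis], shared)
        for axis in "xyz"
    )

shared = {2, 3, 4}
# Hypothetical cell of simplex {1, 2, 3, 4}.
cell_a = {"x": (2, 3, 4, 1), "y": (4, 3, 1, 2), "z": (4, 2, 1, 3)}
# Hypothetical cells of simplex {2, 3, 4, 5}.
cell_b = {"x": (2, 5, 3, 4), "y": (4, 3, 2, 5), "z": (5, 4, 2, 3)}
cell_c = {"x": (3, 2, 5, 4), "y": (4, 3, 2, 5), "z": (5, 4, 2, 3)}

print(compatible(cell_a, cell_b, shared))  # True
print(compatible(cell_a, cell_c, shared))  # False
```

In this example `cell_c` orders atoms 2 and 3 differently along x, so no edge would join it to `cell_a`.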

Lemma 1. Given a simplex $S \in G$ and a cell $\xi_1 \in S$ there is at least one other cell $\xi_2 \in S$ such
that their respective DPSs differ only by a permutation of two numbers.

Let $S = \{n_{s_1}, n_{s_2}, n_{s_3}, n_{s_4}\}$, and assume that we have two cells whose coordinate c DPSs are
$\{(n_{s_1})(n_{s_2})(n_{s_3})(n_{s_4})\}_c$ and $\{(n_{s_1})(n_{s_4})(n_{s_2})(n_{s_3})\}_c$ respectively, the remaining coordinates
being equal, and that there is no cell with sequence $\{(n_{s_1})(n_{s_2})(n_{s_4})(n_{s_3})\}_c$. This is geometrically
impossible, because the coordinate $c_{n_{s_4}}$ cannot pass through $c_{n_{s_2}}$ without first going through $c_{n_{s_3}}$.

There is a class of subgraphs of G called transversals: the set of nodes of a transversal consists
of one cell from every simplex, and each cell has an edge towards every cell in adjacent simplexes.
It was shown in [5] that each cell in a transversal is the projection of one cell from the CS of the
molecule; thus G makes possible the enumeration of accessible cells in the conformational space
of a molecule.
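
A brute-force enumeration of transversals can be sketched as below. It is an illustration under stated assumptions, not the procedure of [5]: cells are tuples of three simple DPSs, adjacency is taken to mean sharing exactly three atoms, and the toy cells are hypothetical. The cost grows exponentially with the number of simplexes, so this is only usable on very small examples.

```python
from itertools import product

def restrict(seq, shared):
    return tuple(n for n in seq if n in shared)

def compatible(cell_a, cell_b, shared):
    return all(restrict(a, shared) == restrict(b, shared)
               for a, b in zip(cell_a, cell_b))

def transversals(simplexes, cells):
    """Yield one-cell-per-simplex choices whose adjacent members are compatible.

    simplexes -- list of 4-tuples of atom numbers
    cells     -- cells[i] lists the visited cells of simplexes[i];
                 a cell is a tuple (x_seq, y_seq, z_seq) of simple DPSs
    """
    faces = {}
    for i in range(len(simplexes)):
        for j in range(i + 1, len(simplexes)):
            shared = set(simplexes[i]) & set(simplexes[j])
            if len(shared) == 3:          # the two simplexes share a face
                faces[(i, j)] = shared
    for choice in product(*cells):
        if all(compatible(choice[i], choice[j], shared)
               for (i, j), shared in faces.items()):
            yield choice

# Hypothetical cells for simplexes {1,2,3,4} and {2,3,4,5}.
A1 = ((2, 3, 4, 1), (4, 3, 1, 2), (4, 2, 1, 3))
A2 = ((3, 2, 4, 1), (4, 3, 1, 2), (4, 2, 1, 3))
B1 = ((2, 5, 3, 4), (4, 3, 2, 5), (5, 4, 2, 3))
B2 = ((3, 2, 5, 4), (4, 3, 2, 5), (5, 4, 2, 3))

found = list(transversals([(1, 2, 3, 4), (2, 3, 4, 5)], [[A1, A2], [B1, B2]]))
print(len(found))  # 2
```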

G can be put in a compact form called C by recursively aggregating sets of DPSs in G into
extended ones that contain them. For instance, the cells from the simplex {9, 10, 14, 15} in G
[4,5]
(15)(14)},

can be put in a compact form in C

$$\begin{aligned}
\{\,&\{\{(9)(10\ 15)(14)\}_x,\ \{(15)(14)(10)(9)\}_y,\ \{(10)(9\ 15)(14)\}_z\},\\
&\{\{(9\ 10)(15)(14)\}_x,\ \{(15)(14)(10)(9)\}_y,\ \{(10)(9\ 15)(14)\}_z\},\\
&\{\{(9)(10\ 15)(14)\}_x,\ \{(15)(14)(10)(9)\}_y,\ \{(9\ 10)(14\ 15)\}_z\},\\
&\{\{(9\ 10)(15)(14)\}_x,\ \{(15)(14)(10)(9)\}_y,\ \{(9\ 10)(14\ 15)\}_z\}\,\}
\end{aligned} \tag{9}$$

¹ An irregular polytope with four vertices.



The notion of compatibility can be extended to the cells in C: two cells in adjacent simplexes 
in C are said to be compatible if the dominance patterns that result from restricting their 
respective DPS sequences to the atom numbers from the shared face are equal. 
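
For extended DPSs the restriction acts block by block. The sketch below takes "equal dominance patterns" to mean that the restricted sequences of blocks coincide, which is an interpretation; the neighbouring simplex {9, 10, 14, 16} and its sequence are hypothetical, while the other two sequences are x sequences from (9).

```python
def restrict_blocks(extended, shared):
    """Restrict an extended DPS (tuple of blocks) to the shared atom numbers.

    Blocks become frozensets because the order inside a block is free;
    blocks emptied by the restriction are dropped.
    """
    restricted = (frozenset(n for n in block if n in shared)
                  for block in extended)
    return tuple(block for block in restricted if block)

def same_pattern(p, q, shared):
    """Compare two extended DPSs of one coordinate on the shared face."""
    return restrict_blocks(p, shared) == restrict_blocks(q, shared)

shared = {9, 10, 14}
x_first = ((9,), (10, 15), (14,))      # {(9)(10 15)(14)}_x from (9)
x_second = ((9, 10), (15,), (14,))     # {(9 10)(15)(14)}_x from (9)
x_neighbour = ((9,), (10, 16), (14,))  # hypothetical, simplex {9, 10, 14, 16}

print(same_pattern(x_first, x_neighbour, shared))   # True
print(same_pattern(x_second, x_neighbour, shared))  # False
```

A full compatibility test would apply `same_pattern` to the x, y and z sequences of the two cells and require all three to agree.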



IV. The fragmentation of the cone 

The cone described in section II encloses the volume in CS that the system can access. This
cone has been built by looking empirically, in computer simulations, at the range of each atom
coordinate independently [5]. Part of the volume is therefore wasted, as the construction of the
cone does not take into account how the x, y and z coordinates are correlated. The graph C can
supply the missing information, allowing us to create cones that wrap more closely around the
cells in it.

We propose here an approach to the fragmentation of the cone that is based on the following set 
of heuristic rules: 

1. For a simplex $S \in C$ we scan the cell sequences for each coordinate independently: if for
the coordinate c the partition sequences of two cells $\xi_1, \xi_2 \in S$ are such that $\mathcal{P}_{1,c} \subset \mathcal{P}_{2,c}$,
then we set $\mathcal{P}_{1,c} := \mathcal{P}_{2,c}$.

2. Let $S = \{n_{s_1}, n_{s_2}, n_{s_3}, n_{s_4}\}$ be a simplex in C and let $\{\mathcal{P}_{x,c} : 1 \le x \le N_S\}$ be its set of cell
sequences in the c coordinate. For 4 numbers there is a set of 24 possible simple DPSs; if
every sequence from this set is contained in at least one $\mathcal{P}_{x,c}$, then $\forall x : 1 \le x \le N_S$ we set

$$\mathcal{P}_{x,c} := \{(n_{s_1}\ n_{s_2}\ n_{s_3}\ n_{s_4})\}_c$$

3. After performing the two previous steps for x, y and z, redundant sequences are removed.

As an example, the set of cells for the simplex S = {3, 4, 5, 8} in C is

$$\begin{aligned}
\{\,&\{\{(5\ 8)(4)(3)\}_x,\ \{(8)(4)(3\ 5)\}_y,\ \{(8)(4\ 5)(3)\}_z\},\\
&\{\{(5\ 8)(4)(3)\}_x,\ \{(8)(4)(3\ 5)\}_y,\ \{(8)(5)(3\ 4)\}_z\},\\
&\{\{(5)(8)(3\ 4)\}_x,\ \{(8)(4)(3\ 5)\}_y,\ \{(8)(4\ 5)(3)\}_z\},\\
&\{\{(5\ 8)(3\ 4)\}_x,\ \{(8)(4\ 5)(3)\}_y,\ \{(8)(3\ 4\ 5)\}_z\}\,\}
\end{aligned} \tag{10}$$

Applying the above transformation gives

$$\begin{aligned}
\{\,&\{\{(5\ 8)(3\ 4)\}_x,\ \{(8)(4)(3\ 5)\}_y,\ \{(8)(3\ 4\ 5)\}_z\},\\
&\{\{(5\ 8)(3\ 4)\}_x,\ \{(8)(4\ 5)(3)\}_y,\ \{(8)(3\ 4\ 5)\}_z\}\,\}
\end{aligned} \tag{11}$$

(11) is a cone that contains (10); in GDPS notation it reads

$$\{\{(5\ 8)(3\ 4)\}_x,\ \{(8)\,(_1\ 4\ (_2\ 5\ )_1\ 3\ )_2\}_y,\ \{(8)(3\ 4\ 5)\}_z\} \tag{12}$$



This operation is performed in order to determine the minimal set of ordinary cones that contain 
the cells from a given simplex. 
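
The three heuristic rules can be sketched in Python and checked against the example: applied to the cells of (10), the function below returns the two cells of (11). The data layout, the function names and the tie-breaking choice in rule 1 (when several sequences contain a given one, the largest is taken) are assumptions of this sketch.

```python
from itertools import permutations, product

AXES = "xyz"

def simple_sequences(dps):
    per_block = [list(permutations(block)) for block in dps]
    return frozenset(sum(choice, ()) for choice in product(*per_block))

def canonical(dps):
    """Sort the numbers inside each block so that equal DPSs compare equal."""
    return tuple(tuple(sorted(block)) for block in dps)

def transform(cells, atoms):
    """Apply rules 1-3 to the cells of one simplex.

    cells -- list of dicts {axis: extended DPS as a tuple of blocks}
    atoms -- the four atom numbers of the simplex
    """
    cells = [{c: canonical(cell[c]) for c in AXES} for cell in cells]
    all_simple = frozenset(permutations(atoms))      # the 24 simple DPSs
    for c in AXES:
        column = {cell[c] for cell in cells}
        encoded = {p: simple_sequences(p) for p in column}
        covered = frozenset().union(*encoded.values())
        for cell in cells:
            if covered == all_simple:
                # Rule 2: every simple DPS is present, use a single block.
                cell[c] = (tuple(sorted(atoms)),)
                continue
            # Rule 1: replace a sequence by one that strictly contains it.
            p = cell[c]
            bigger = [q for q in column if encoded[p] < encoded[q]]
            if bigger:
                cell[c] = max(bigger, key=lambda q: (len(encoded[q]), q))
    # Rule 3: remove repeated cells, keeping the first occurrence.
    unique = []
    for cell in cells:
        if cell not in unique:
            unique.append(cell)
    return unique

cells_10 = [
    {"x": ((5, 8), (4,), (3,)), "y": ((8,), (4,), (3, 5)), "z": ((8,), (4, 5), (3,))},
    {"x": ((5, 8), (4,), (3,)), "y": ((8,), (4,), (3, 5)), "z": ((8,), (5,), (3, 4))},
    {"x": ((5,), (8,), (3, 4)), "y": ((8,), (4,), (3, 5)), "z": ((8,), (4, 5), (3,))},
    {"x": ((5, 8), (3, 4)), "y": ((8,), (4, 5), (3,)), "z": ((8,), (3, 4, 5))},
]
cells_11 = transform(cells_10, (3, 4, 5, 8))
for cell in cells_11:
    print(cell)
```

In this example rule 2 is never triggered, since no coordinate covers all 24 simple sequences; the reduction from four cells to two comes from rules 1 and 3 alone.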

One aim of this note is to show that the set of rules described above gives meaningful results; a
detailed study of the decomposition of (9) will be the subject of a further communication. Here,
we show the above procedure at work on a simpler case, that of the first 13 α-carbons from our
example molecule. They have the GDPS

$$\begin{aligned}
\{\,&\{(5\ (8\ (6\ (9)\ 11)\ 7)\ (3\ 4\ 10)\ 15\ (12)\ (14)\ 13)\}_x,\\
&\{(_1 15)_1\ (_2 14)_2\ (_3 13)_3\ (_4 11\ 12)_4\ (_5 10)_5\ (_6 9)_6\ (_7 8)_7\ (_8 7\ (_9 4)_8\ (_{10} 5\ 6)_9\ 3)_{10}\}_y,\\
&\{(7\ 8\ 10\ (6\ 9\ 11\ 13\ (12)\ (15)\ 14)\ 3\ 4\ 5)\}_z\,\}
\end{aligned} \tag{13}$$

We can see a total of 7 sequences of permuting numbers in x, 10 in y and 4 in z. Our interest
here is in the realizable combinations of them, obtained by merging compatible transformed
sequences from different simplexes.

A first example is the $x_4$ sequence, which has 6 numbers in it, (3 4 7 9 10 11). This means that
the basic information about it, that is, how it connects with sequences in other dimensions, is
contained in 15 simplexes, whose transformed DPSs are
Obviously the sequences from these 15 cones are all compatible with those from adjacent simplexes,
so they can all be merged to give the result

$$\{\{(3\ 4\ 7\ 9\ 10\ 11)\}_x,\ \{(11)(10)(9)(4\ 7)(3)\}_y,\ \{(7\ 9\ 10\ 11)(3\ 4)\}_z\} \tag{15}$$

in which $x_4$ appears to be connected with sequences $z_1$ and $z_4$, or at least with fragments of
them. In contrast with (15), for the sequence $z_1$, of length 8, merging the DPSs from 70
simplexes leaves a total of 7 cones

{{(8 9)(6 7 11)(10)(12)(13)}, , {(13)(11 12)(7)(10)(9)(8)(7)(6)}„ , {(6 7 8 9 10 11 12 13)} 2 } 
{{(8 9 11)(6 7)(10)(12)(13)}, , {(13)(11 12)(7)(10)(9)(8)(7)(6)}„ , {(6 7 8 9 10 11 12 13)} 2 } 
{{(8)(6 7 9 11)(10)(12)(13)}, , {(13)(11 12)(7)(10)(9)(8)(7)(6)}„ , {(6 7 8 9 10 11 12 13)} 2 } 
{{(8 11)(6 7 9)(10)(12)(13)}, , {(13)(11 12)(7)(10)(9)(8)(7)(6)}„ , {(6 7 8 9 10 11 12 13)} 2 } 
{{(6 8 11)(7 9 10) (12)(13)}, , {(13)(11 12)(7)(10)(9)(8)(7)(6)} w , {(6 7 8 9 10 11 12 13)} 2 } 
{{(6 8)(7 9 10 11) (12)(13)}, , {(13)(11 12)(7)(10)(9)(8)(7)(6)}„ , {(6 7 8 9 10 11 12 13)} 2 } 
{{(6 8 9 11)(7 10) (12)(13)}, , {(13)(11 12)(7)(10)(9)(8)(7)(6)} 1/ , {(6 7 8 9 10 11 12 13)} 2 } 

$z_1$ appears to be connected with $x_2$, $x_3$ and $x_6$, but there are also 4 cones that combine with
fragments from the main sequences. This reveals details of how 3D structures are organized
inside the main cone: only some of the DPSs in one dimension connect with those from another
dimension. The exception in this example is the set of segments from the y dimension: there is so
little structure in them that they appear to combine with any other segment from x and z.
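
The counts of 15 and 70 simplexes quoted above are consistent with taking every four-number subset of a sequence, since $\binom{6}{4} = 15$ and $\binom{8}{4} = 70$. Assuming that reading, the simplexes attached to a sequence can be listed as follows (the variable names are illustrative).

```python
from itertools import combinations
from math import comb

x4 = (3, 4, 7, 9, 10, 11)            # the six numbers of the x_4 sequence
z1 = (6, 7, 8, 9, 10, 11, 12, 13)    # the eight numbers of the z_1 sequence

# Every choice of four numbers from a sequence defines one simplex.
simplexes_x4 = list(combinations(x4, 4))
simplexes_z1 = list(combinations(z1, 4))

print(len(simplexes_x4), len(simplexes_z1))  # 15 70
print(simplexes_x4[0])                       # (3, 4, 7, 9)
```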



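The remaining segments $x_1$, $x_5$ and $z_4$ give

$$\{\{(5\ 6\ 8\ 9)\}_x,\ \{(9)(8)(5\ 6)\}_y,\ \{(6\ 8\ 9)(5)\}_z\} \tag{$x_1$}$$

$$\{\{(3\ 4\ 10\ 12\ 15)\}_x,\ \{(15)(12)(10)(4)(3)\}_y,\ \{(10\ 12\ 15)(3\ 4)\}_z\} \tag{$x_5$}$$

$$\begin{aligned}
&\{\{(5)(3\ 4\ 15)(14)\}_x,\ \{(15)(14)(4)(3\ 5)\}_y,\ \{(3\ 4\ 5\ 14\ 15)\}_z\}\\
&\{\{(5)(3\ 4\ 15)(14)\}_x,\ \{(15)(14)(4\ 5)(3)\}_y,\ \{(3\ 4\ 5\ 14\ 15)\}_z\}
\end{aligned} \tag{$z_4$}$$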



VI. Conclusion 



The GDPS (7) is a global approximation that sets the bounds of the region of CS in which
the system evolves. These bounds are set independently for the x, y and z coordinates of the
molecule, and consequently they are not correlated: this means that much of the volume enclosed
by (7) does not correspond to realizable 3D-structures.

On the other hand, the graph of cells G allows the exact enumeration of the set of visited cells in
conformational space, but this possibility is probably algorithmically hopeless.

The approach presented in this paper consists in using the information contained in G to derive
bounds from (7) that are correlated in x, y and z, and to progressively narrow these bounds
around interesting regions.

The importance of a structure like (7), and of the DPSs that are derived from it, lies not only in
the precision they can attain in delimiting conformational space, but also in the fact that they
possess a graphical structure, and connected graphs always have a metric: we can measure the
distance between two nodes as the length of the minimal path that joins them. This opens the very
real possibility of measuring and enumerating distances between points and conformations, so that
a combinatorial Hamiltonian can be built on these structures. This should be the next phase
of the present work.
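
The graph metric mentioned above, the length of a minimal path between two nodes, can be computed with a breadth-first search. The sketch below uses a hypothetical toy graph in which nodes stand for cells and edges join adjacent cells; it does not use data from the molecule studied here.

```python
from collections import deque

def graph_distance(adjacency, source, target):
    """Length of a shortest path between two nodes, or None if none exists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return None

# Hypothetical toy graph: nodes stand for cells, edges join adjacent cells.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
print(graph_distance(adjacency, "a", "e"))  # 3
```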